INFO-I 590: Data Visualization: A comparative Analysis of Global Health Care Facilities: WASH Services Access Analysis¶

Under the guidance of Prof. Haewoon Kwak

Team Members:

  1. Bhargavi Vasudev Jahagirdar
  2. Naman Vipul Shah
  3. Sakshi Gatyan
  4. Palavi Dhanaji Patil
  5. Vaishanvi Sachin Shastri

Dataset Details: The dataset contains the following columns:

  • ISO3: country codes (String)
  • Country: country names (String/ Categorical)
  • Residence/Facility Type: government, rural, urban, hospital, non-hospital, non-government, total (String/ Categorical)
  • Service Type: water, sanitation, hygiene, environmental cleaning healthcare waste (String/ Categorical)
  • Year: year of the data collection (DateTime.year)
  • Coverage: percentage of coverage of managed healthcare waste services (Numeric)
  • Population: population of the country (Numeric)
  • Service level: status of services available: limited service, no service, basic service (String/ Categorical)

Data Preprocessing¶

For this project, we will be using a subset of the WASH dataset, which contains data from the years 2019, 2020, 2021, and 2022. This dataset has been merged with another dataset that includes GDP information for each country in the WASH dataset, enabling a comparative study against GDP. Below is the process followed to prepare the final dataset for data visualization.

In [1]:
!pip install jupyter-dash
Requirement already satisfied: jupyter-dash in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (0.4.2)
Requirement already satisfied: dash in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (2.18.2)
Requirement already satisfied: requests in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (2.31.0)
Requirement already satisfied: flask in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (2.2.2)
Requirement already satisfied: retrying in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (1.3.4)
Requirement already satisfied: ipython in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (8.12.0)
Requirement already satisfied: ipykernel in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (6.19.2)
Requirement already satisfied: ansi2html in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (1.9.2)
Requirement already satisfied: nest-asyncio in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-dash) (1.5.6)
Requirement already satisfied: Werkzeug<3.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (2.2.3)
Requirement already satisfied: plotly>=5.0.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (5.9.0)
Requirement already satisfied: dash-html-components==2.0.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (2.0.0)
Requirement already satisfied: dash-core-components==2.0.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (2.0.0)
Requirement already satisfied: dash-table==5.0.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (5.0.0)
Requirement already satisfied: importlib-metadata in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (6.0.0)
Requirement already satisfied: typing-extensions>=4.1.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (4.12.2)
Requirement already satisfied: setuptools in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from dash->jupyter-dash) (68.0.0)
Requirement already satisfied: Jinja2>=3.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from flask->jupyter-dash) (3.1.2)
Requirement already satisfied: itsdangerous>=2.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from flask->jupyter-dash) (2.0.1)
Requirement already satisfied: click>=8.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from flask->jupyter-dash) (8.0.4)
Requirement already satisfied: comm>=0.1.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (0.1.2)
Requirement already satisfied: debugpy>=1.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (1.6.7)
Requirement already satisfied: jupyter-client>=6.1.12 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (7.4.9)
Requirement already satisfied: matplotlib-inline>=0.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (0.1.6)
Requirement already satisfied: packaging in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (23.0)
Requirement already satisfied: psutil in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (5.9.0)
Requirement already satisfied: pyzmq>=17 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (23.2.0)
Requirement already satisfied: tornado>=6.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (6.3.2)
Requirement already satisfied: traitlets>=5.4.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipykernel->jupyter-dash) (5.7.1)
Requirement already satisfied: backcall in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (0.2.0)
Requirement already satisfied: decorator in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (5.1.1)
Requirement already satisfied: jedi>=0.16 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (0.18.1)
Requirement already satisfied: pickleshare in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (3.0.36)
Requirement already satisfied: pygments>=2.4.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (2.15.1)
Requirement already satisfied: stack-data in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (0.2.0)
Requirement already satisfied: colorama in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from ipython->jupyter-dash) (0.4.6)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from requests->jupyter-dash) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from requests->jupyter-dash) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from requests->jupyter-dash) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from requests->jupyter-dash) (2023.7.22)
Requirement already satisfied: six>=1.7.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from retrying->jupyter-dash) (1.16.0)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jedi>=0.16->ipython->jupyter-dash) (0.8.3)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from Jinja2>=3.0->flask->jupyter-dash) (2.1.1)
Requirement already satisfied: entrypoints in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-client>=6.1.12->ipykernel->jupyter-dash) (0.4)
Requirement already satisfied: jupyter-core>=4.9.2 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-client>=6.1.12->ipykernel->jupyter-dash) (5.3.0)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-client>=6.1.12->ipykernel->jupyter-dash) (2.8.2)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from plotly>=5.0.0->dash->jupyter-dash) (8.2.2)
Requirement already satisfied: wcwidth in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython->jupyter-dash) (0.2.5)
Requirement already satisfied: zipp>=0.5 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from importlib-metadata->dash->jupyter-dash) (3.11.0)
Requirement already satisfied: executing in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from stack-data->ipython->jupyter-dash) (0.8.3)
Requirement already satisfied: asttokens in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from stack-data->ipython->jupyter-dash) (2.0.5)
Requirement already satisfied: pure-eval in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from stack-data->ipython->jupyter-dash) (0.2.2)
Requirement already satisfied: platformdirs>=2.5 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-core>=4.9.2->jupyter-client>=6.1.12->ipykernel->jupyter-dash) (2.5.2)
Requirement already satisfied: pywin32>=300 in c:\users\bhargavi jahagirdar\anaconda3\lib\site-packages (from jupyter-core>=4.9.2->jupyter-client>=6.1.12->ipykernel->jupyter-dash) (305.1)
In [2]:
import pandas as pd
In [3]:
df_gdp = pd.read_excel('reshaped_country_timeseries.xlsx')
df_gdp.head()
Out[3]:
Country Name Code Indicator Name Year GDP per capita (current US$)
0 Afghanistan AFG GDP per capita (current US$) 2019 500.522981
1 Afghanistan AFG GDP per capita (current US$) 2020 516.866797
2 Afghanistan AFG GDP per capita (current US$) 2021 363.674087
3 Afghanistan AFG GDP per capita (current US$) 2022 353.000000
4 Albania ALB GDP per capita (current US$) 2019 5396.214227
In [4]:
df_final = pd.read_excel('final.xlsx')
df_final.head()
Out[4]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level
0 AFG Afghanistan total Environmental cleaning 2019 84.00000 3.172638e+07 Basic service
1 AFG Afghanistan hospital Environmental cleaning 2019 79.11322 2.988067e+07 Basic service
2 AFG Afghanistan non_hospital Environmental cleaning 2019 81.84787 3.091353e+07 Basic service
3 AFG Afghanistan hospital Hygiene 2019 28.72340 1.084868e+07 Basic service
4 AFG Afghanistan total Sanitation 2019 2.50000 9.442375e+05 Basic service
In [5]:
df_gdp.rename(columns={'Code':'ISO3'} , inplace=True)
In [6]:
df_gdp.head()
Out[6]:
Country Name ISO3 Indicator Name Year GDP per capita (current US$)
0 Afghanistan AFG GDP per capita (current US$) 2019 500.522981
1 Afghanistan AFG GDP per capita (current US$) 2020 516.866797
2 Afghanistan AFG GDP per capita (current US$) 2021 363.674087
3 Afghanistan AFG GDP per capita (current US$) 2022 353.000000
4 Albania ALB GDP per capita (current US$) 2019 5396.214227
In [7]:
final_dataset = pd.merge(df_final,df_gdp,on=['ISO3', 'Year'], how='inner')

final_dataset.head()
Out[7]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level Country Name Indicator Name GDP per capita (current US$)
0 AFG Afghanistan total Environmental cleaning 2019 84.00000 3.172638e+07 Basic service Afghanistan GDP per capita (current US$) 500.522981
1 AFG Afghanistan hospital Environmental cleaning 2019 79.11322 2.988067e+07 Basic service Afghanistan GDP per capita (current US$) 500.522981
2 AFG Afghanistan non_hospital Environmental cleaning 2019 81.84787 3.091353e+07 Basic service Afghanistan GDP per capita (current US$) 500.522981
3 AFG Afghanistan hospital Hygiene 2019 28.72340 1.084868e+07 Basic service Afghanistan GDP per capita (current US$) 500.522981
4 AFG Afghanistan total Sanitation 2019 2.50000 9.442375e+05 Basic service Afghanistan GDP per capita (current US$) 500.522981
In [8]:
final_dataset = final_dataset.drop(['Country Name'], axis=1)

final_dataset.head()
Out[8]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level Indicator Name GDP per capita (current US$)
0 AFG Afghanistan total Environmental cleaning 2019 84.00000 3.172638e+07 Basic service GDP per capita (current US$) 500.522981
1 AFG Afghanistan hospital Environmental cleaning 2019 79.11322 2.988067e+07 Basic service GDP per capita (current US$) 500.522981
2 AFG Afghanistan non_hospital Environmental cleaning 2019 81.84787 3.091353e+07 Basic service GDP per capita (current US$) 500.522981
3 AFG Afghanistan hospital Hygiene 2019 28.72340 1.084868e+07 Basic service GDP per capita (current US$) 500.522981
4 AFG Afghanistan total Sanitation 2019 2.50000 9.442375e+05 Basic service GDP per capita (current US$) 500.522981
In [9]:
final_dataset.shape

final_dataset.info()

final_dataset.nunique()

print(final_dataset['Coverage'].describe())
print(final_dataset['Service Type'].value_counts())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 40636 entries, 0 to 40635
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   ISO3                          40636 non-null  object 
 1   Country                       40636 non-null  object 
 2   Residence / Facility Type     40636 non-null  object 
 3   Service Type                  40636 non-null  object 
 4   Year                          40636 non-null  int64  
 5   Coverage                      40636 non-null  float64
 6   Population                    40636 non-null  float64
 7   Service level                 40636 non-null  object 
 8   Indicator Name                40636 non-null  object 
 9   GDP per capita (current US$)  40636 non-null  float64
dtypes: float64(3), int64(1), object(6)
memory usage: 3.4+ MB
count    40636.000000
mean        54.434492
std         43.730914
min          0.000000
25%          2.849000
50%         60.274000
75%        100.000000
max        100.000000
Name: Coverage, dtype: float64
Water                     9296
Health care waste         8370
Sanitation                8326
Hygiene                   7662
Environmental cleaning    6982
Name: Service Type, dtype: int64

How does the coverage of different service types differ between rural and urban areas within a country?¶

Let's first analyze the global trend for the coverage of different service types across different facility types

In [10]:
import plotly.graph_objects as go

year_to_visualize = [2019, 2020, 2021, 2022]
combined_data = final_dataset[final_dataset['Year'].isin(year_to_visualize)].groupby(['Year', 'Residence / Facility Type']).agg({'Coverage': 'mean'}).reset_index()

pivot_data = combined_data.pivot(index='Residence / Facility Type', columns='Year', values='Coverage')

for year in year_to_visualize:
    pivot_data[f'{year}_Diffs'] = [
        ", ".join(
            f"{year_prev}: {pivot_data[year][i] - pivot_data[year_prev][i]:.2f}"
            for year_prev in year_to_visualize if year_prev < year
        )
        if year > year_to_visualize[0] else "N/A"
        for i in range(len(pivot_data))
    ]

base_color = 'Blues'

fig = go.Figure()

x = pivot_data.index

for i, year in enumerate(year_to_visualize):

    tooltip_text = [
        f"Year: {year}<br>Coverage: {pivot_data[year][facility]:.2f}%<br>"
        f"Differences from preceding years: {pivot_data[f'{year}_Diffs'][facility]}"
        for facility in pivot_data.index
    ]

    fig.add_trace(go.Bar(
        x=x,
        y=pivot_data[year],
        name=str(year),
        hoverinfo="text",
        hovertext=tooltip_text,
        marker=dict(color=i / len(year_to_visualize), colorscale=base_color)
    ))


fig.update_layout(
    title="Coverage of Residence / Facility Type by Year (with Differences from Preceding Years)",
    xaxis_title="Residence / Facility Type",
    yaxis_title="Coverage (%)",
    barmode="group",
    xaxis_tickangle=-45,
    legend_title="Year",
    template="plotly_white"
)

fig.show()
governmenthospitalnon_governmentnon_hospitalruraltotalurban010203040506070
Year2019202020212022Coverage of Residence / Facility Type by Year (with Differences from Preceding Years)Residence / Facility TypeCoverage (%)
plotly-logomark

Let's check the trends for specific countries in specific years

In [11]:
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output

JupyterDash._server_threads.clear()
In [12]:
JupyterDash._server_threads.clear()

countries = [{'label': country, 'value': country} for country in final_dataset['Country'].unique()]
years = [{'label': year, 'value': year} for year in final_dataset['Year'].unique()]

app1 = JupyterDash(name='ResidenceCoverage')

app1.layout = html.Div([
    html.H1("Coverage of Residence / Facility Type by Year", style={'textAlign': 'center'}),

    html.Div([
        dcc.Dropdown(
            id='country-dropdown-1',
            options=countries,
            placeholder="Select a Country",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Dropdown(
            id='year-dropdown-1',
            options=years,
            placeholder="Select Year(s)",
            multi=True,
            style={'width': '48%', 'display': 'inline-block'}
        )
    ]),

    dcc.Graph(id='residence-graph')
])

@app1.callback(
    Output('residence-graph', 'figure'),
    [Input('country-dropdown-1', 'value'),
     Input('year-dropdown-1', 'value')]
)
def update_residence_graph(selected_country, selected_years):
    if not selected_country or not selected_years:
        return go.Figure(
            layout={'title': "Select a country and year(s) to view the data"}
        )

    filtered_data = final_dataset[
        (final_dataset['Country'] == selected_country) & (final_dataset['Year'].isin(selected_years))
    ]

    if filtered_data.empty:
        return go.Figure(
            layout={'title': f"No data available for {selected_country} in selected year(s)"}
        )

    combined_data = filtered_data.groupby(['Year', 'Residence / Facility Type']).agg({'Coverage': 'mean'}).reset_index()
    pivot_data = combined_data.pivot(index='Residence / Facility Type', columns='Year', values='Coverage').fillna(0)

    fig = go.Figure()

    for year in selected_years:
        fig.add_trace(go.Bar(
            x=pivot_data.index,
            y=pivot_data[year],
            name=str(year),
        ))

    fig.update_layout(
        title=f"Coverage for {selected_country} ({', '.join(map(str, selected_years))})",
        xaxis_title="Residence / Facility Type",
        yaxis_title="Coverage (%)",
        barmode="group"
    )
    return fig

app1.run_server(mode="inline", port=8050)
C:\Users\Bhargavi Jahagirdar\anaconda3\Lib\site-packages\dash\dash.py:579: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

Considering Ghana as an example, we can ssee that the coverage is higher for Urban faciltiy type than Rural facility type. Considering another example, we see that Bhutan has equal coverage for both the facility types. Through this visualization, we can dig deep on the coverage for each country individually

What is the distribution of population for each Service Type?¶

In [13]:
JupyterDash._server_threads.clear()
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output
import plotly.graph_objects as go
import pandas as pd

countries = [{'label': country, 'value': country} for country in final_dataset['Country'].unique()]
years = [{'label': year, 'value': year} for year in sorted(final_dataset['Year'].unique())]
residence_types = [{'label': res, 'value': res} for res in final_dataset['Residence / Facility Type'].unique()]

service_type_colors = {
    service_type: color
    for service_type, color in zip(
        final_dataset['Service Type'].unique(),
        ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A', '#19D3F3', '#FF6692', '#B6E880']
    )
}

app = JupyterDash(name='ServiceLevels')

app.layout = html.Div([
    html.H1("Population Distribution by Service Types", style={'textAlign': 'center'}),

    html.Div([
        html.Div([
            html.Label("Select a Country:"),
            dcc.Dropdown(
                id='country-dropdown',
                options=countries,
                placeholder="Select a Country",
                style={'width': '90%'}
            )
        ], style={'width': '30%', 'display': 'inline-block', 'verticalAlign': 'top'}),

        html.Div([
            html.Label("Select Year(s):"),
            dcc.Dropdown(
                id='year-dropdown',
                options=years,
                placeholder="Select Year(s)",
                multi=True,
                style={'width': '90%'}
            )
        ], style={'width': '30%', 'display': 'inline-block', 'verticalAlign': 'top'}),

        html.Div([
            html.Label("Select Residence / Facility Type:"),
            dcc.Dropdown(
                id='residence-dropdown',
                options=residence_types,
                placeholder="Select Residence / Facility Type",
                multi=True,
                style={'width': '90%'}
            )
        ], style={'width': '30%', 'display': 'inline-block', 'verticalAlign': 'top'}),
    ], style={'marginBottom': '20px'}),

    # Graphs for each service level
    html.Div([
        dcc.Graph(id='no-service-graph', style={'display': 'inline-block', 'width': '48%'}),
        dcc.Graph(id='limited-service-graph', style={'display': 'inline-block', 'width': '48%'}),
    ]),
    html.Div([
        dcc.Graph(id='basic-service-graph', style={'display': 'inline-block', 'width': '48%'}),
        dcc.Graph(id='insufficient-service-graph', style={'display': 'inline-block', 'width': '48%'}),
    ]),
])

def create_graph(filtered_data, service_level, default_title):
    data = filtered_data[filtered_data['Service level'] == service_level]

    if data.empty:
        return go.Figure(layout={'title': f"No data available for {service_level}"})

    grouped_data = data.groupby('Service Type')['Population'].sum().reset_index()
    grouped_data = grouped_data[grouped_data['Population'] > 0]  # Exclude zero population rows

    fig = go.Figure(
        data=[
            go.Bar(
                x=grouped_data['Service Type'],
                y=grouped_data['Population'],
                marker_color=[service_type_colors[stype] for stype in grouped_data['Service Type']]
            )
        ],
        layout={
            'title': default_title,
            'xaxis_title': 'Service Type',
            'yaxis_title': 'Population'
        }
    )
    return fig

@app.callback(
    [Output('no-service-graph', 'figure'),
     Output('limited-service-graph', 'figure'),
     Output('basic-service-graph', 'figure'),
     Output('insufficient-service-graph', 'figure')],
    [Input('country-dropdown', 'value'),
     Input('year-dropdown', 'value'),
     Input('residence-dropdown', 'value')]
)
def update_graphs(selected_country, selected_years, selected_residence):

    filtered_data = final_dataset.copy()
    if selected_country:
        filtered_data = filtered_data[filtered_data['Country'] == selected_country]
    if selected_years:
        filtered_data = filtered_data[filtered_data['Year'].isin(selected_years)]
    if selected_residence:
        filtered_data = filtered_data[filtered_data['Residence / Facility Type'].isin(selected_residence)]

    no_service_fig = create_graph(filtered_data, 'No service', 'Population Distribution: No Service Level')
    limited_service_fig = create_graph(filtered_data, 'Limited service', 'Population Distribution: Limited Service Level')
    basic_service_fig = create_graph(filtered_data, 'Basic service', 'Population Distribution: Basic Service')
    insufficient_service_fig = create_graph(filtered_data, 'Insufficient data', 'Population Distribution: Insufficient data Level')

    return no_service_fig, limited_service_fig, basic_service_fig, insufficient_service_fig

app.run_server(mode="inline", port=8056)

How has the coverage of service types evolved over time for a country or globally?¶

In [14]:
JupyterDash._server_threads.clear()

year_to_visualize = [2019, 2020, 2021, 2022]
combined_data = final_dataset[final_dataset['Year'].isin(year_to_visualize)].groupby(['Year', 'Service Type']).agg({'Coverage': 'mean'}).reset_index()

pivot_data = combined_data.pivot(index='Service Type', columns='Year', values='Coverage')

for year in year_to_visualize:
    pivot_data[f'{year}_Diffs'] = [
        ", ".join(
            f"{year_prev}: {pivot_data[year][i] - pivot_data[year_prev][i]:.2f}"
            for year_prev in year_to_visualize if year_prev < year
        )
        if year > year_to_visualize[0] else "N/A"
        for i in range(len(pivot_data))
    ]

base_color = 'Blues'

fig = go.Figure()

x = pivot_data.index

for i, year in enumerate(year_to_visualize):

    tooltip_text = [
        f"Year: {year}<br>Coverage: {pivot_data[year][service]:.2f}%<br>"
        f"Differences from preceding years: {pivot_data[f'{year}_Diffs'][service]}"
        for service in pivot_data.index
    ]

    fig.add_trace(go.Bar(
        x=x,
        y=pivot_data[year],
        name=str(year),
        hoverinfo="text",
        hovertext=tooltip_text,
        marker=dict(color=i / len(year_to_visualize), colorscale=base_color)
    ))

fig.update_layout(
    title="Global Average Coverage of Service Types by Year (with Differences from Preceding Years)",
    xaxis_title="Service Type",
    yaxis_title="Average Coverage (%)",
    barmode="group",
    xaxis_tickangle=-45,
    legend_title="Year",
    template="plotly_white"
)

fig.show()
Environmental cleaningHealth care wasteHygieneSanitationWater0102030405060
Year2019202020212022Global Average Coverage of Service Types by Year (with Differences from Preceding Years)Service TypeAverage Coverage (%)
plotly-logomark
In [15]:
JupyterDash._server_threads.clear()

app2 = JupyterDash(name='ServiceTypeCoverage')

app2.layout = html.Div([
    html.H1("Coverage of Service Types by Year", style={'textAlign': 'center'}),

    html.Div([
        dcc.Dropdown(
            id='country-dropdown-2',
            options=countries,
            placeholder="Select a Country",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Dropdown(
            id='year-dropdown-2',
            options=years,
            placeholder="Select Year(s)",
            multi=True,
            style={'width': '48%', 'display': 'inline-block'}
        )
    ]),

    dcc.Graph(id='service-graph')
])

@app2.callback(
    Output('service-graph', 'figure'),
    [Input('country-dropdown-2', 'value'),
     Input('year-dropdown-2', 'value')]
)
def update_service_graph(selected_country, selected_years):
    if not selected_country or not selected_years:
        return go.Figure(
            layout={'title': "Select a country and year(s) to view the data"}
        )

    filtered_data = final_dataset[
        (final_dataset['Country'] == selected_country) & (final_dataset['Year'].isin(selected_years))
    ]

    if filtered_data.empty:
        return go.Figure(
            layout={'title': f"No data available for {selected_country} in selected year(s)"}
        )

    combined_data = filtered_data.groupby(['Year', 'Service Type']).agg({'Coverage': 'mean'}).reset_index()
    pivot_data = combined_data.pivot(index='Service Type', columns='Year', values='Coverage').fillna(0)

    fig = go.Figure()

    for year in selected_years:
        fig.add_trace(go.Bar(
            x=pivot_data.index,
            y=pivot_data[year],
            name=str(year),
        ))

    fig.update_layout(
        title=f"Coverage for {selected_country} ({', '.join(map(str, selected_years))})",
        xaxis_title="Service Type",
        yaxis_title="Coverage (%)",
        barmode="group"
    )
    return fig

app2.run_server(mode="inline", port=8051)
C:\Users\Bhargavi Jahagirdar\anaconda3\Lib\site-packages\dash\dash.py:579: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

What is the geographic distribution of coverage for a specific service in a specific year?¶

In [16]:
import plotly.express as px
JupyterDash._server_threads.clear()

years = [{'label': year, 'value': year} for year in final_dataset['Year'].unique()]
service_types = [{'label': service, 'value': service} for service in final_dataset['Service Type'].unique()]

app3 = JupyterDash("Viz3")

app3.layout = html.Div([
    html.H1("Coverage of Services", style={'textAlign': 'center'}),

    html.Div([
        dcc.Dropdown(
            id='year-dropdown',
            options=years,
            placeholder="Select a Year",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Dropdown(
            id='service-type-dropdown',
            options=service_types,
            placeholder="Select a Service Type",
            style={'width': '48%', 'display': 'inline-block'}
        )
    ]),

    dcc.Graph(id='choropleth-map')
])

@app3.callback(
    Output('choropleth-map', 'figure'),
    [Input('year-dropdown', 'value'),
     Input('service-type-dropdown', 'value')]
)
def update_choropleth(year, service_type):
    if not year or not service_type:
        return px.scatter(title="Select both year and service type to view the map")

    map_df = final_dataset[(final_dataset['Year'] == year) & (final_dataset['Service Type'] == service_type)]
    if map_df.empty:
        return px.scatter(title=f"No data available for {service_type} in {year}")

    fig = px.choropleth(map_df, locations="ISO3", color="Coverage",
                        hover_name="Country",
                        title=f"Coverage of {service_type} in {year}",
                        color_continuous_scale="viridis")
    return fig

app3.run_server(mode='inline', port=8052)
C:\Users\Bhargavi Jahagirdar\anaconda3\Lib\site-packages\dash\dash.py:579: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

What is the geographic distribution of coverage for a facility/ residence type in a specific year?¶

In [17]:
JupyterDash._server_threads.clear()

years = [{'label': year, 'value': year} for year in final_dataset['Year'].unique()]
facility_types = [{'label': facility, 'value': facility} for facility in final_dataset['Residence / Facility Type'].unique()]

app4 = JupyterDash(__name__)

app4.layout = html.Div([
    html.H1("Dynamic Choropleth Map for Facility and Residence Type", style={'textAlign': 'center'}),

    html.Div([
        dcc.Dropdown(
            id='year-dropdown',
            options=years,
            placeholder="Select a Year",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Dropdown(
            id='facility-type-dropdown',
            options=facility_types,
            placeholder="Select a Facility/Residence Type",
            style={'width': '48%', 'display': 'inline-block'}
        )
    ]),

    dcc.Graph(id='choropleth-map')
])

@app4.callback(
    Output('choropleth-map', 'figure'),
    [Input('year-dropdown', 'value'),
     Input('facility-type-dropdown', 'value')]
)
def update_choropleth(year, facility_type):
    if not year or not facility_type:
        return px.scatter(title="Select both year and facility type to view the map")

    map_df = final_dataset[(final_dataset['Year'] == year) & (final_dataset['Residence / Facility Type'] == facility_type)]
    if map_df.empty:
        return px.scatter(title=f"No data available for {facility_type} in {year}")

    fig = px.choropleth(map_df, locations="ISO3", color="Coverage",
                        hover_name="Country",
                        title=f"Coverage for {facility_type} in {year}",
                        color_continuous_scale="viridis")  # Changed to Viridis
    return fig

app4.run_server(mode='inline', port=8053)

What are the trends in service coverage across different service types over the years?¶

In [18]:
temp_df = final_dataset.groupby(['Year', 'Service Type'])['Coverage'].mean().reset_index()

fig = px.line(
    temp_df,
    x='Year',
    y='Coverage',
    color='Service Type',
    markers=True,
    title='Temporal Trends in Service Coverage',
    labels={
        'Year': 'Year',
        'Coverage': 'Average Coverage (%)',
        'Service Type': 'Service Type'
    },
    hover_data={
        'Year': True,
        'Coverage': ':.2f',
        'Service Type': True
    }
)

fig.update_xaxes(
    tickmode='array',
    tickvals=[2019, 2020, 2021, 2022],
    range=[2018.5, 2022.5]
)

fig.update_layout(
    title={'font': {'size': 16}},
    xaxis_title={'font': {'size': 12}},
    yaxis_title={'font': {'size': 12}},
    legend_title={'font': {'size': 12}},
    template='seaborn'
)

fig.show()
201920202021202250556065
Service TypeEnvironmental cleaningHealth care wasteHygieneSanitationWaterTemporal Trends in Service CoverageYearAverage Coverage (%)
plotly-logomark

How do different countries compare in terms of the percentage of population covered by basic or limited and no service level for different service types?¶

In [19]:
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Ensure 'Year' is numeric
final_dataset['Year'] = final_dataset['Year'].astype(int)

# Group and reshape the data (do not calculate the coverage yet)
df_grouped = final_dataset.groupby(['Year', 'Country', 'Service Type', 'Service level'])['Coverage'].mean().reset_index()

# Create a Dash App
app = Dash(__name__)

app.layout = html.Div([
    html.H1("Population Coverage by Service Levels"),
    html.Label("Select Year:"),
    dcc.Dropdown(
        id='year-dropdown',
        options=[{'label': year, 'value': year} for year in sorted(df_grouped['Year'].unique())],
        value=sorted(df_grouped['Year'].unique())[0],  # Default to the first year
        placeholder="Select a year..."
    ),
    html.Label("Select Country:"),
    dcc.Dropdown(
        id='country-dropdown',
        options=[{'label': country, 'value': country} for country in df_grouped['Country'].unique()],
        multi=True,
        value=df_grouped['Country'].unique()[:3],  # Default selected countries
        placeholder="Select countries..."
    ),
    dcc.Graph(id='bar-chart')
])

@app.callback(
    Output('bar-chart', 'figure'),
    [Input('year-dropdown', 'value'),
     Input('country-dropdown', 'value')]
)
def update_chart(selected_year, selected_countries):
    # Filter data for the selected year and countries
    filtered_data = df_grouped[
        (df_grouped['Year'] == selected_year) &
        (df_grouped['Country'].isin(selected_countries))
    ]

    # Calculate coverage after filtering
    coverage_per_year = final_dataset[
        (final_dataset['Year'] == selected_year) &
        (final_dataset['Country'].isin(selected_countries))
    ].groupby(['Year', 'Country', 'Service Type', 'Service level'])['Coverage'].mean().reset_index()

    # Generate simple bar chart
    fig = px.bar(
        coverage_per_year,
        x='Country',
        y='Coverage',
        color='Service level',
        facet_col='Service Type',
        title=f"Population Coverage by Service Levels for {selected_year}",
        labels={'Coverage': 'Percentage Coverage'},
        barmode='group',
        height=600
    )

    # Remove "Service Type=" from facet headers
    for annotation in fig.layout.annotations:
        if "Service Type=" in annotation.text:
            annotation.text = annotation.text.replace("Service Type=", "")

    # Update layout for better visualization
    fig.update_layout(
        xaxis_tickangle=45,
        legend_title="Service Level",
        title_font_size=20
    )
    return fig

if __name__ == '__main__':
    app.run_server(debug=True, port=8058)
In [20]:
import pandas as pd
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output

# Ensure 'Year' is numeric
final_dataset['Year'] = final_dataset['Year'].astype(int)

# Create a Dash App
app = Dash(__name__)

# Layout for the Dash app
app.layout = html.Div([
    html.H1("Coverage by Service Type, Service Level, and Country"),
    html.Label("Select Year:"),
    dcc.Dropdown(
        id='year-dropdown',
        options=[{'label': year, 'value': year} for year in sorted(final_dataset['Year'].unique())],
        value=sorted(final_dataset['Year'].unique())[0],  # Default to the first year
        placeholder="Select a year..."
    ),
    html.Label("Select Service Level:"),
    dcc.Dropdown(
        id='service-level-dropdown',
        options=[{'label': level, 'value': level} for level in final_dataset['Service level'].unique()],
        value=final_dataset['Service level'].unique()[0],  # Default to the first service level
        placeholder="Select a service level..."
    ),
    html.Label("Select Service Type:"),
    dcc.Dropdown(
        id='service-type-dropdown',
        options=[{'label': service_type, 'value': service_type} for service_type in final_dataset['Service Type'].unique()],
        value=final_dataset['Service Type'].unique()[0],  # Default to the first service type
        placeholder="Select a service type..."
    ),
    dcc.Graph(id='choropleth-map')
])

@app.callback(
    Output('choropleth-map', 'figure'),
    [Input('year-dropdown', 'value'),
     Input('service-level-dropdown', 'value'),
     Input('service-type-dropdown', 'value')]
)
def update_map(selected_year, selected_service_level, selected_service_type):
    # Filter data for the selected year, service level, and service type
    filtered_data = final_dataset[
        (final_dataset['Year'] == selected_year) &
        (final_dataset['Service level'] == selected_service_level) &
        (final_dataset['Service Type'] == selected_service_type)
    ]

    # Create a dictionary to map countries to coverage
    country_coverage = filtered_data.groupby('Country')['Coverage'].mean().reset_index()

    # Create the choropleth map
    fig = go.Figure(go.Choropleth(
        locations=country_coverage['Country'],
        locationmode='country names',
        z=country_coverage['Coverage'],
        hoverinfo='location+z',
        colorbar_title="Coverage",
        colorscale="Viridis"
    ))

    # Update the layout of the map
    fig.update_layout(
        title=f"Coverage for {selected_service_type} - {selected_service_level} ({selected_year})",
        geo=dict(showcoastlines=True, coastlinecolor="Black", projection_type="mercator"),
        title_font_size=20,
        width=900,
        height=500
    )

    return fig

if __name__ == '__main__':
    app.run_server(debug=True, port= 8059)

Correlation between population size and coverage by Service Type¶

  • Is there any positive/negative correlation between population size and coverage?

  • what is the overall correlation by each service type

In [21]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = final_dataset

# Convert Population column to more understandable form (e.g., millions)
#data['Population'] = data['Population'] / 1e6

# Get unique service types
service_types = data['Service Type'].unique()

# Plot a scatter plot for each service type
for service_type in service_types:
    plt.figure(figsize=(12, 8))
    service_data = data[data['Service Type'] == service_type]
    sns.scatterplot(
        data=service_data,
        x='Population',
        y='Coverage',
        hue='Service Type',
        palette='viridis',
        s=100,
        alpha=0.8
    )

    # Add labels and title
    plt.title(f'Correlation Between Population Size and Coverage for {service_type}', fontsize=16)
    plt.xlabel('Population Size (in millions)', fontsize=14)
    plt.ylabel('Percentage Coverage', fontsize=14)
    plt.grid(True)

    # Show the plot
    plt.show()

# Optionally, calculate the correlation coefficient for each service type
for service_type in service_types:
    service_data = data[data['Service Type'] == service_type]
    correlation = service_data[['Population', 'Coverage']].corr().iloc[0, 1]
    print(f"Correlation coefficient for {service_type}: {correlation:.2f}")
Correlation coefficient for Environmental cleaning: 0.15
Correlation coefficient for Hygiene: 0.15
Correlation coefficient for Sanitation: 0.15
Correlation coefficient for Water: 0.17
Correlation coefficient for Health care waste: 0.16

Has population under each service type has improved / deteriorated over time? What is the trend in overall population access to under each service level?¶

In [22]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
data = final_dataset

# Convert Population column to more understandable form (e.g., millions)
#data['Population'] = data['Population'] / 1e6

# Ensure the Year column is treated as an integer
data['Year'] = data['Year'].astype(int)

# Filter data for the relevant service types and service level
sanitation_water_data = data[(data['Service Type'].isin(['Sanitation', 'Water', 'Hygiene', 'Environmental cleaning', 'Health care waste'])) & (data['Service level'] == 'No service')]

# Group data by year and service type, summing population
grouped_data = sanitation_water_data.groupby(['Year', 'Service Type'])['Population'].sum().unstack(fill_value=0)

# Plot trends over time
plt.figure(figsize=(12, 8))
grouped_data.plot(kind='line', marker='o', figsize=(14, 8), linewidth=2)

# Dynamically set x-axis ticks based on unique years
plt.xticks(ticks=grouped_data.index, labels=grouped_data.index.astype(str), fontsize=12)

# Add labels and title
plt.title('Population Without Access to Service (No Service) by year', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Population (in millions)', fontsize=14)
plt.legend(title='Service Type', fontsize=12)
plt.grid(True)

# Show the plot
plt.tight_layout()
plt.show()
<Figure size 1200x800 with 0 Axes>

How do population density and GDP per capita correlate with the availability and coverage of service types in different countries?¶

Data tidying and cleaning¶

In [23]:
final_dataset.head()
Out[23]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level Indicator Name GDP per capita (current US$)
0 AFG Afghanistan total Environmental cleaning 2019 84.00000 3.172638e+07 Basic service GDP per capita (current US$) 500.522981
1 AFG Afghanistan hospital Environmental cleaning 2019 79.11322 2.988067e+07 Basic service GDP per capita (current US$) 500.522981
2 AFG Afghanistan non_hospital Environmental cleaning 2019 81.84787 3.091353e+07 Basic service GDP per capita (current US$) 500.522981
3 AFG Afghanistan hospital Hygiene 2019 28.72340 1.084868e+07 Basic service GDP per capita (current US$) 500.522981
4 AFG Afghanistan total Sanitation 2019 2.50000 9.442375e+05 Basic service GDP per capita (current US$) 500.522981

Dataset with Residence / Facility Type = total

In [24]:
total_residence_data = final_dataset[final_dataset['Residence / Facility Type'] == 'total']
total_residence_data.head()
Out[24]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level Indicator Name GDP per capita (current US$)
0 AFG Afghanistan total Environmental cleaning 2019 84.0 31726380.0 Basic service GDP per capita (current US$) 500.522981
4 AFG Afghanistan total Sanitation 2019 2.5 944237.5 Basic service GDP per capita (current US$) 500.522981
5 AFG Afghanistan total Water 2019 79.0 29837905.0 Basic service GDP per capita (current US$) 500.522981
8 AFG Afghanistan total Health care waste 2019 82.0 30970990.0 Basic service GDP per capita (current US$) 500.522981
12 AFG Afghanistan total Sanitation 2019 92.0 34747940.0 Limited service GDP per capita (current US$) 500.522981

Creating a copy of the main dataset

In [25]:
from copy import deepcopy
total_residence_data_copy = deepcopy(total_residence_data)
total_residence_data_copy.head()
Out[25]:
ISO3 Country Residence / Facility Type Service Type Year Coverage Population Service level Indicator Name GDP per capita (current US$)
0 AFG Afghanistan total Environmental cleaning 2019 84.0 31726380.0 Basic service GDP per capita (current US$) 500.522981
4 AFG Afghanistan total Sanitation 2019 2.5 944237.5 Basic service GDP per capita (current US$) 500.522981
5 AFG Afghanistan total Water 2019 79.0 29837905.0 Basic service GDP per capita (current US$) 500.522981
8 AFG Afghanistan total Health care waste 2019 82.0 30970990.0 Basic service GDP per capita (current US$) 500.522981
12 AFG Afghanistan total Sanitation 2019 92.0 34747940.0 Limited service GDP per capita (current US$) 500.522981

Creating an aggregate of the coverage and total population based on rows that have similar ISO3 code, Service Type and Year thus compiling the results for various kinds of service level

In [26]:
total_residence_data_copy['total_coverage_population'] = (total_residence_data_copy['Coverage'] * total_residence_data_copy['Population']) / 100


# Group by ISO3, Country, Service Type, and Year and calculate required aggregations
aggregated_data = total_residence_data_copy.groupby(['ISO3', 'Country', 'Service Type', 'Year', 'GDP per capita (current US$)']).agg(
    total_population=('Population', 'sum'),
    total_coverage_population=('total_coverage_population', 'sum')
).reset_index()


# Calculate the Coverage column in the new DataFrame
aggregated_data['Coverage'] = (aggregated_data['total_coverage_population'] * 100 / aggregated_data['total_population']).round(1)

# Rename the total_population column to Population
aggregated_data.rename(columns={'total_population': 'Population'}, inplace=True)

# Select and reorder columns for the final DataFrame
cleaned_dataset = aggregated_data[['ISO3', 'Country', 'Service Type', 'Year', 'Coverage', 'Population', 'GDP per capita (current US$)']]

cleaned_dataset.head()
Out[26]:
ISO3 Country Service Type Year Coverage Population GDP per capita (current US$)
0 AFG Afghanistan Environmental cleaning 2019 73.1 37769500.0 500.522981
1 AFG Afghanistan Environmental cleaning 2020 73.1 38972232.0 516.866797
2 AFG Afghanistan Environmental cleaning 2021 73.1 40099464.0 363.674087
3 AFG Afghanistan Environmental cleaning 2022 73.1 41128772.0 353.000000
4 AFG Afghanistan Health care waste 2019 70.5 37769500.0 500.522981
In [27]:
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output
import plotly.graph_objects as go
import pandas as pd

JupyterDash._server_threads.clear()

Visualization for population vs coverage

Dividing the graph into top 20 and bottom 50 to get a better view of the data

In [28]:
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output
import plotly.express as px
import pandas as pd

# Initialize the app
app = JupyterDash(__name__)

# Dropdown options
service_types = [{'label': service, 'value': service} for service in cleaned_dataset['Service Type'].unique()]

# App layout
app.layout = html.Div([
    html.H1("Interactive Coverage Analysis", style={'textAlign': 'center'}),

    # Service Type dropdown and Year slider
    html.Div([
        dcc.Dropdown(
            id='service-type-dropdown',
            options=service_types,
            value=cleaned_dataset['Service Type'].unique()[0],  # Default to the first service type
            placeholder="Select a Service Type",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Slider(
            id='year-slider',
            min=cleaned_dataset['Year'].min(),
            max=cleaned_dataset['Year'].max(),
            step=1,
            value=cleaned_dataset['Year'].min(),
            marks={year: str(year) for year in range(cleaned_dataset['Year'].min(), cleaned_dataset['Year'].max() + 1)},
            tooltip={"placement": "bottom", "always_visible": True},
        )
    ]),

    # Graph containers
    dcc.Graph(id='top-20-population-graph'),
    dcc.Graph(id='least-50-population-graph'),
])

# Callback to update graphs
@app.callback(
    [Output('top-20-population-graph', 'figure'),
     Output('least-50-population-graph', 'figure')],
    [Input('service-type-dropdown', 'value'),
     Input('year-slider', 'value')]
)
def update_graphs(selected_service_type, selected_year):
    # Filter data by year and service type
    filtered_data = cleaned_dataset[(cleaned_dataset['Year'] == selected_year) &
                                  (cleaned_dataset['Service Type'] == selected_service_type)]

    # Sort by population
    sorted_data = filtered_data.sort_values(by='Population', ascending=False)

    # Top 20 countries with the highest population
    top_20_data = sorted_data.head(20)
    fig1 = px.scatter(
        top_20_data,
        x='Coverage',
        y='Population',
        color='Country',
        size='Population',
        hover_name='Country',
        title=f"Top 20 Countries by Population for {selected_service_type} in {selected_year}",
        labels={'Coverage': 'Coverage (%)', 'Population': 'Population'}
    )
    fig1.update_layout(legend_title_text="Country", legend_orientation="v")

    # Bottom 50 countries with the least population
    least_50_data = sorted_data.tail(50)
    fig2 = px.scatter(
        least_50_data,
        x='Coverage',
        y='Population',
        color='Country',
        size='Population',
        hover_name='Country',
        title=f"50 Countries with Least Population for {selected_service_type} in {selected_year}",
        labels={'Coverage': 'Coverage (%)', 'Population': 'Population'}
    )
    fig2.update_layout(legend_title_text="Country", legend_orientation="v")

    return fig1, fig2

# Run the app
app.run_server(mode='inline', port=8054)
C:\Users\Bhargavi Jahagirdar\anaconda3\Lib\site-packages\dash\dash.py:579: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

GDP vs Coverage

In [29]:
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output
import plotly.express as px
import pandas as pd

# Initialize the app
app4 = JupyterDash(__name__)

# Dropdown options
service_types = [{'label': service, 'value': service} for service in cleaned_dataset['Service Type'].unique()]

# App layout
app4.layout = html.Div([
    html.H1("Interactive GDP vs. Coverage Analysis", style={'textAlign': 'center'}),

    # Service Type dropdown and Year slider
    html.Div([
        dcc.Dropdown(
            id='service-type-dropdown',
            options=service_types,
            value=cleaned_dataset['Service Type'].unique()[0],  # Default to the first service type
            placeholder="Select a Service Type",
            style={'width': '48%', 'display': 'inline-block', 'margin-right': '2%'}
        ),
        dcc.Slider(
            id='year-slider',
            min=cleaned_dataset['Year'].min(),
            max=cleaned_dataset['Year'].max(),
            step=1,
            value=cleaned_dataset['Year'].min(),
            marks={year: str(year) for year in range(cleaned_dataset['Year'].min(), cleaned_dataset['Year'].max() + 1)},
            tooltip={"placement": "bottom", "always_visible": True},
        )
    ]),

    # Graph containers
    dcc.Graph(id='top-20-gdp-graph'),
    dcc.Graph(id='least-50-gdp-graph'),
])

# Callback to update graphs
@app4.callback(
    [Output('top-20-gdp-graph', 'figure'),
     Output('least-50-gdp-graph', 'figure')],
    [Input('service-type-dropdown', 'value'),
     Input('year-slider', 'value')]
)
def update_graphs(selected_service_type, selected_year):
    # Filter data by year and service type
    filtered_data = cleaned_dataset[(cleaned_dataset['Year'] == selected_year) &
                                  (cleaned_dataset['Service Type'] == selected_service_type)]

    # Sort by GDP
    sorted_data = filtered_data.sort_values(by='GDP per capita (current US$)', ascending=False)

    # Top 20 countries with the highest GDP
    top_20_data = sorted_data.head(20)
    fig1 = px.scatter(
        top_20_data,
        x='Coverage',
        y='GDP per capita (current US$)',
        color='Country',
        size='GDP per capita (current US$)',
        hover_name='Country',
        title=f"Top 20 Countries by GDP for {selected_service_type} in {selected_year}",
        labels={'Coverage': 'Coverage (%)', 'GDP': 'GDP (in billions)'}
    )
    fig1.update_layout(legend_title_text="Country", legend_orientation="v")

    # Bottom 50 countries with the least GDP
    least_50_data = sorted_data.tail(50)
    fig2 = px.scatter(
        least_50_data,
        x='Coverage',
        y='GDP per capita (current US$)',
        color='Country',
        size='GDP per capita (current US$)',
        hover_name='Country',
        title=f"50 Countries with Least GDP for {selected_service_type} in {selected_year}",
        labels={'Coverage': 'Coverage (%)', 'GDP': 'GDP (in billions)'}
    )
    fig2.update_layout(legend_title_text="Country", legend_orientation="v")

    return fig1, fig2

# Run the app
app4.run_server(mode='inline', port=8055)
C:\Users\Bhargavi Jahagirdar\anaconda3\Lib\site-packages\dash\dash.py:579: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.